PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Authors

  • Jian Pei
  • Jiawei Han
  • Behzad Mortazavi-Asl
  • Helen Pinto
  • Qiming Chen
  • Umeshwar Dayal
  • Meichun Hsu
Abstract

Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of Apriori, which may substantially reduce the number of combinations to be examined. However, Apriori still encounters problems when a sequence database is large and/or when sequential patterns to be mined are numerous and/or long. In this paper, we propose a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining. PrefixSpan mines the complete set of patterns but greatly reduces the efforts of candidate subsequence generation. Moreover, prefix-projection substantially reduces the size of projected databases and leads to efficient processing. Our performance study shows that PrefixSpan outperforms both the Apriori-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.
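The prefix-projection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each sequence element is a single item (the paper's setting also allows itemsets as elements), and it represents a projected database as (sequence index, offset) pairs rather than copied suffixes. The names `prefixspan`, `mine`, and `min_support` are illustrative only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every frequent sequential pattern.

    Simplifying assumption: each sequence is a list of single items.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds one (sequence index, start offset) pair per
        # sequence that contains `prefix`; the suffix from that offset on
        # is the sequence's entry in the prefix-projected database.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Keep only the first occurrence: it leaves the longest suffix.
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item, new_projected in sorted(occurrences.items()):
            support = len(new_projected)  # one entry per supporting sequence
            if support >= min_support:
                pattern = prefix + (item,)
                results[pattern] = support
                # Grow the pattern only inside its own, smaller projection.
                mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy database, invented for illustration.
db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
patterns = prefixspan(db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

No candidate sequences are generated and tested against the whole database here: each recursive call counts items only in the suffixes that follow the current prefix, so the data scanned shrinks as patterns grow longer.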

Similar resources

PrefixSpan: Mining Sequential Patterns by Prefix-Projected Pattern

Sequential pattern mining discovers frequent subsequences as patterns in a sequence database. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long pa...

Mining Constraint-based Multidimensional Frequent Sequential Pattern in Web Logs

In this paper we introduce an efficient strategy for discovering ... Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in ...

Efficient Method for Mining Patterns from Highly Similar and Dense Database based on Prefix-Frequent-Items

In recent years, a great deal of effort has gone into sequential pattern mining, but some challenges remain unresolved, such as large search spaces and ineffectiveness in handling highly similar, dense and long sequences. This paper mainly focuses on how to design effective search space pruning methods to accelerate the mining process. We present a novel structure, PrefixFrequent-...

Sequential Pattern Mining by Pattern-Growth: Principles and Extensions

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP...

Effective Mining Sequential Pattern by Last Position Induction

Sequence pattern mining is an important research problem because it is the basis of many other applications. Yet implementing the mining efficiently is difficult due to an inherent characteristic of the problem: the large size of the data set. In this paper, by combining SPAM, we propose a new algorithm called LAst Position INduction Sequential PAttern Mining (abbreviated as LAPIN-SPAM), wh...

Publication date: 2001